AFC Data Explorer

The Age Friendly Communities (AFC) project requires several key datasets (pulled from a variety of data sources) to answer the data questions that are part of it. This document is intended as a guide to the aggregated AFC dataset as well as a record of any analysis and visualizations applied to this dataset in answering the data questions.

Overview

The exploratory analysis detailed in the rest of the document attempt to answer questions under Data Questions. These questions and additional contextual information can be found in this document. Details of the analysis along with visualizations pertaining to each data question are under Analysis. A summary of the insights drawn from the analysis including notes on additional work needed to be carried out and caveats (if any) can be found under Conclusions.

Data Questions

A. RCFE Capacity

What is the number of RCFEs in a given community as defined above?
What is the capacity (by licensed bed) in a given community?
How many of the RCFEs in a given community participate in the Assisted Living Waiver Program (ALWP)?

B. 65+ Population

In each community, what is the current population of 65+?
Can population estimates for the 65+ population in 2030 be located for each community?

C. San Diego’s Alzheimer’s Population

Based on current and future projections for San Diego residents with ADOD,how does the current RCFE capacity compare?

D. Low-Income Seniors

Identify %s of low-income seniors in each given community.
What is the RCFE capacity in areas with greater % of low-income 65+ residents?
How many RCFEs are participating in the ALWP in areas with greater % of low-income 65+ residents?
Are there less RCFEs (in terms of number & various licensed bed capacity) in communities with greater % of low-income 65+?
Do communities with greater % of low-income 65+ have higher prevalence of ADOD?

E. Race

Are there less RCFEs (in terms of number & various licensed bed capacity) in communities with greater % of certain minority populations?
Are communities with greater % of certain minority populations less likely to be able to afford RCFE services?
Do communities with greater % of certain minority populations have higher prevalence of ADOD?



In [13]:

    
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import pylab, mlab, gridspec
from IPython.core.pylabtools import figsize, getfigs
from IPython.display import display, HTML
from pylab import *

# GLOBALS

# working directory
CWD = os.getcwd()

# Data File(s)
VERSION = "20170321"
DATAFILE = "afc_" + VERSION + ".csv"

datapath = os.path.join(CWD,DATAFILE)

# read the data file into a data frame
df = pd.read_csv(datapath)

Data Summary

The AFC dataset was generated using data (wholly or partially) from a number of disparate data sources aggregated along the following geographical IDs (where possible) specific to the San Diego county:

SRA
Zipcode
ZCTA

For a list of the individual data sources that contributed to the AFC aggregated dataset please see this resource.



In [14]:

    
# a custom definition of an info method callable on a series
def sinfo(x):
    xu = x.unique()
    l = len(x)
    typ = str(np.array(xu.tolist()).dtype) + " "
    nans = "null: " + str(l - x.count()) + " "
    lgt = "len: " + str(l) + " "
    unq = "unq: " + str(len(xu))
    return (typ+nans+lgt+unq)

# function to output summary information (dimensions,data types among others) for 
# the specified dataframe
def summarize(df):
    # output the number of rows and cols of the data frame
	print "Dimensions: {}".format(df.shape)
    # print the variables, the number of unique values and a list of unique values for each col
	print "{}".format(df.apply(lambda x:(sinfo(x),x.unique())))
    
summarize(df)

df.head()









    



Dimensions: (158, 26)
SRA                           (|S23 null: 0 len: 158 unq: 41, [Vista, Laguna...
Region                        (|S13 null: 0 len: 158 unq: 6, [North Coastal,...
Zipcode                       (int64 null: 0 len: 158 unq: 116, [92081, 9208...
ZCTA                          (int64 null: 0 len: 158 unq: 114, [92081, 9208...
NumRCFELicensed               (int64 null: 0 len: 158 unq: 35, [7, 29, 43, 0...
NumRCFEBedsLicensed           (float64 null: 0 len: 158 unq: 85, [174.0, 153...
NumRCFEPending                (int64 null: 0 len: 158 unq: 9, [0, 2, 1, 7, 8...
NumRCFEBedsPending            (float64 null: 0 len: 158 unq: 23, [0.0, 18.0,...
NumRCFEInALWP                 (int64 null: 0 len: 158 unq: 9, [1, 0, 2, 5, 6...
2012Pop65Over                 (int64 null: 0 len: 158 unq: 38, [0, 10527, 94...
2012Pop55Over                 (int64 null: 0 len: 158 unq: 38, [0, 21056, 20...
2030Pop65Over                 (int64 null: 0 len: 158 unq: 37, [0, 18489, 19...
2030Pop55Over                 (int64 null: 0 len: 158 unq: 38, [0, 30059, 30...
2012PopADOD55Over             (|S4 null: 0 len: 158 unq: 37, [0, 2313, 5, 13...
2030PopADOD55Over             (int64 null: 0 len: 158 unq: 36, [0, 3165, 11,...
2012PopLowIncome55Over        (int64 null: 0 len: 158 unq: 40, [1535, 0, 113...
2012PopLowIncome65Over        (int64 null: 0 len: 158 unq: 39, [860, 0, 47, ...
2012PopMinority               (int64 null: 0 len: 158 unq: 39, [0, 55995, 14...
2012ADODPerRCFE               (float64 null: 0 len: 158 unq: 32, [0.0, 53.79...
2030ADODPerRCFE               (float64 null: 0 len: 158 unq: 33, [0.0, 73.6,...
2012PercentLowIncome65Over    (float64 null: 0 len: 158 unq: 36, [0.0, 8.17,...
2012PercentLowIncome55Over    (float64 null: 0 len: 158 unq: 36, [0.0, 7.29,...
2012LowIncome65OverPerRCFE    (float64 null: 0 len: 158 unq: 37, [0.0, 20.0,...
PopMinorityPerRCFE            (float64 null: 0 len: 158 unq: 34, [0.0, 1302....
2012MedianHHIncome            (int64 null: 0 len: 158 unq: 41, [55071, 0, 44...
2012MedianHHIncome65Over      (int64 null: 0 len: 158 unq: 38, [45104, 0, 43...
dtype: object






    Out[14]:






  
    
      
      SRA
      Region
      Zipcode
      ZCTA
      NumRCFELicensed
      NumRCFEBedsLicensed
      NumRCFEPending
      NumRCFEBedsPending
      NumRCFEInALWP
      2012Pop65Over
      ...
      2012PopLowIncome65Over
      2012PopMinority
      2012ADODPerRCFE
      2030ADODPerRCFE
      2012PercentLowIncome65Over
      2012PercentLowIncome55Over
      2012LowIncome65OverPerRCFE
      PopMinorityPerRCFE
      2012MedianHHIncome
      2012MedianHHIncome65Over
    
  
  
    
      0
      Vista
      North Coastal
      92081
      92081
      7
      174.0
      0
      0.0
      1
      0
      ...
      860
      0
      0.00
      0.0
      0.00
      0.00
      0.0
      0.00
      55071
      45104
    
    
      1
      Vista
      North Coastal
      92083
      92083
      7
      153.0
      0
      0.0
      0
      0
      ...
      0
      0
      0.00
      0.0
      0.00
      0.00
      0.0
      0.00
      0
      0
    
    
      2
      Vista
      North Coastal
      92084
      92084
      29
      441.0
      2
      18.0
      0
      0
      ...
      0
      0
      0.00
      0.0
      0.00
      0.00
      0.0
      0.00
      0
      0
    
    
      3
      Vista
      North Coastal
      0
      0
      43
      768.0
      2
      18.0
      1
      10527
      ...
      860
      55995
      53.79
      73.6
      8.17
      7.29
      20.0
      1302.21
      55071
      45104
    
    
      4
      Laguna-Pine Valley
      East
      91916
      91916
      0
      0.0
      0
      0.0
      0
      0
      ...
      47
      0
      0.00
      0.0
      0.00
      0.00
      0.0
      0.00
      44969
      43952
    
  

5 rows × 26 columns

NOTES

San Diego county has 6 unique Regions, 41 SRAs and a 100+ Zipcodes and ZCTAs each. Each Region contains one or more SRAs which in turn are comprised of one or more Zipcodes. Zipcodes and ZCTAs are mostly synonomous with a rare ZCTA overlapping more than one Zipcode.
All data is available on a per SRA basis while some data are available on a per zipcode basis. Some data (e.g.: population estimates) have intentionally been downloaded at the SRA level since SRA was the lowest common denominator for most datasets.
Rows with zipcode/zcta set to 0 represent the SRA-wide data; such data are aggregates of per zipcode data for a given SRA.

Visualizations

Below visualizations are here as an aid to explore the dataset further prior to any additional analysis.



In [15]:

    
# group by Region
df_g_region = df.groupby('Region')

# compute counts for key indicators for each group
indices = []
cols = ['NumRCFELicensed','NumRCFEBedsLicensed','NumRCFEInALWP','2012Pop65Over','2030Pop65Over',
        '2012PopMinority','2012MedianHHIncome','2012MedianHHIncome65Over','2012PopLowIncome65Over']
# create a 2-D array to hold information for each group
dim = (len(df_g_region.groups),len(cols))
data = np.zeros(dim, dtype=np.float)
pos = 0

for region, group in df_g_region:
    
    indices.append(region)
    
    sra_data = group[group['Zipcode'] == 0]
    #display(sra_data)
    
    sra_data_sum = sra_data[cols].sum(axis=0).to_frame().T
    #display(sra_data_sum)
        
    data[pos] = sra_data_sum.values
    pos = pos + 1

df_region = pd.DataFrame(data=data,columns=cols,index=indices)
df_region = df_region.apply(lambda x: x.astype(int))
display(df_region) 

# convert population and income data to kilo units
df_region[cols[3:]] = df_region[cols[3:]].apply(lambda x: x/1000)
df_region['NumRCFEBedsLicensed'] = df_region['NumRCFEBedsLicensed'].apply(lambda x: int(x/100))

# convert to long format for visualization
df_region = df_region.T
region_list = df_region.columns.tolist()









    






  
    
      
      NumRCFELicensed
      NumRCFEBedsLicensed
      NumRCFEInALWP
      2012Pop65Over
      2030Pop65Over
      2012PopMinority
      2012MedianHHIncome
      2012MedianHHIncome65Over
      2012PopLowIncome65Over
    
  
  
    
      Central
      53
      1199
      3
      16488
      41996
      217067
      116920
      71453
      6421
    
    
      East
      115
      3397
      1
      58209
      108215
      170686
      714472
      507167
      9867
    
    
      North Central
      127
      2923
      7
      61500
      131679
      199803
      750232
      626076
      2973
    
    
      North Coastal
      117
      4610
      8
      63348
      117628
      195750
      310686
      201005
      3052
    
    
      North Inland
      172
      5145
      15
      77559
      139352
      251448
      732062
      748747
      6015
    
    
      South
      53
      2093
      1
      49953
      100009
      365047
      317281
      250268
      7243



In [12]:

    
# this configuration below prevents pylab from importing anything into the global namespace
# needed to prevent user warning about namespace clashes 
#%config InteractiveShellApp.pylab_import_all = False
#%pylab inline
%matplotlib inline

matplotlib.rcParams['font.size'] = 10
matplotlib.rcParams['font.weight'] = 'bold'

# plot the above in a bar graph 
width = 0.3
labels = ['NumRCFELicensed','NumRCFEBedsLicensed','NumRCFE (ALWP)','65+ (2012)','65+ (2030)','Minorities (2012)',
          'Med HH Income','Med HH Income (65+)','Low Income (65+)']
pos = np.arange(len(labels))
x_ticks_pos = pos + (0.15 * width)

num_regions = len(region_list)
fig, axes = plt.subplots(num_regions, 1, figsize=(15, 25))

for i,region in enumerate(region_list):
    
    plot_vals = df_region[region]
    bars = axes[i].bar(pos,plot_vals,width,color='#00688B',alpha=0.5)
    
    for j, bar in enumerate(bars):
        ht = bar.get_height()
        bar_pos = bar.get_x() + bar.get_width()/2.
        ht = ((ht+1) * -50) if plot_vals[j] < 0 else ht+10
        txt = '{:.0f}K'.format(plot_vals[j]) if j>2 else '{:.0f}'.format(plot_vals[j])
        txt = '{:.0f} (x 100)'.format(plot_vals[j]) if j==2 else txt
        axes[i].text(bar_pos, ht, txt, ha='center', va='bottom')

    axes[i].spines['top'].set_visible(False)
    axes[i].spines['right'].set_visible(False)
        
    axes[i].set_xlim(min(pos)-width, max(pos)+width)
    axes[i].set_ylim([0, max(plot_vals)] )

    title = region
    axes[i].set_title(title,fontsize=14, fontweight='bold',color='#FF8C00')
    axes[i].title.set_position([.5,1.05])
    #axes[i].grid(True)
    
    if i == (num_regions-1):
        axes[i].set_xticks(x_ticks_pos)
        axes[i].set_xticklabels(labels,rotation=45,fontsize=11, fontweight='bold')
    else:
        axes[i].tick_params(axis='x',labelbottom='off')
        
    axes[i].margins(0.3, 0)    
    
plt.show()

Analysis

To be completed.

Conclusions

To be completed.

Tested for Python versions: Python 2.7.12 :: Anaconda custom (64-bit)

	SRA	Region	Zipcode	ZCTA	NumRCFELicensed	NumRCFEBedsLicensed	NumRCFEPending	NumRCFEBedsPending	NumRCFEInALWP	2012Pop65Over	...	2012PopLowIncome65Over	2012PopMinority	2012ADODPerRCFE	2030ADODPerRCFE	2012PercentLowIncome65Over	2012PercentLowIncome55Over	2012LowIncome65OverPerRCFE	PopMinorityPerRCFE	2012MedianHHIncome	2012MedianHHIncome65Over
0	Vista	North Coastal	92081	92081	7	174.0	0	0.0	1	0	...	860	0	0.00	0.0	0.00	0.00	0.0	0.00	55071	45104
1	Vista	North Coastal	92083	92083	7	153.0	0	0.0	0	0	...	0	0	0.00	0.0	0.00	0.00	0.0	0.00	0	0
2	Vista	North Coastal	92084	92084	29	441.0	2	18.0	0	0	...	0	0	0.00	0.0	0.00	0.00	0.0	0.00	0	0
3	Vista	North Coastal	0	0	43	768.0	2	18.0	1	10527	...	860	55995	53.79	73.6	8.17	7.29	20.0	1302.21	55071	45104
4	Laguna-Pine Valley	East	91916	91916	0	0.0	0	0.0	0	0	...	47	0	0.00	0.0	0.00	0.00	0.0	0.00	44969	43952

	NumRCFELicensed	NumRCFEBedsLicensed	NumRCFEInALWP	2012Pop65Over	2030Pop65Over	2012PopMinority	2012MedianHHIncome	2012MedianHHIncome65Over	2012PopLowIncome65Over
Central	53	1199	3	16488	41996	217067	116920	71453	6421
East	115	3397	1	58209	108215	170686	714472	507167	9867
North Central	127	2923	7	61500	131679	199803	750232	626076	2973
North Coastal	117	4610	8	63348	117628	195750	310686	201005	3052
North Inland	172	5145	15	77559	139352	251448	732062	748747	6015
South	53	2093	1	49953	100009	365047	317281	250268	7243